CHAPTER 21 Summarizing and Graphing Survival Data
So, how do you analyze survival data containing censoring? The following sections
explain the correct ways to proceed as well as mistakes to avoid.
Analyzing censored data properly
Statisticians have developed techniques that use the partial information
contained in censored observations. Later in this chapter, we describe two of the
most popular techniques: the life-table method and the Kaplan-Meier (K-M)
method. To understand these methods, you first need to understand two
fundamental concepts, hazard and survival:
» The hazard rate is the probability of the participant dying in the next small
interval of time, assuming the participant is alive right now.
» The survival rate is the probability of the participant living for a certain
amount of time after some starting time point.
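To see how the two concepts fit together, here is a minimal sketch in Python. It assumes made-up hazard rates for three one-year intervals (the numbers and the function name survival_from_hazards are hypothetical, chosen only for illustration). To still be alive at the end of a given year, a participant has to get through every interval up to that point, so the survival rate is the running product of (1 - hazard):

```python
# Hypothetical hazard rates: the chance of dying during each year,
# given that the participant was alive at the start of that year.
hazard_by_year = [0.10, 0.20, 0.25]


def survival_from_hazards(hazards):
    """Return the probability of still being alive at the end of each interval."""
    survival = []
    alive = 1.0  # everyone is alive at the starting time point
    for h in hazards:
        alive *= 1.0 - h  # the participant must also get through this interval
        survival.append(alive)
    return survival


survival_by_year = survival_from_hazards(hazard_by_year)

for year, (h, s) in enumerate(zip(hazard_by_year, survival_by_year), start=1):
    print(f"Year {year}: hazard = {h:.2f}, survival = {s:.2f}")
```

With these illustrative numbers, the survival rate falls from 0.90 after the first year to 0.72 after the second and 0.54 after the third, even though no single year's hazard exceeds 0.25.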
The first task when analyzing survival data is usually to describe how the hazard
and survival rates vary with time. In this chapter, we show you how to estimate
the hazard and survival rates, summarize them as tables, and display them as
graphs. Most of the larger statistical packages (such as those described in
Chapter 4) allow you to do the calculations we describe automatically, so you may
never have to do them manually. But without first understanding how these
methods work, it’s almost impossible to understand any other aspect of survival
analysis, so we provide a demonstration for instructional purposes.
Making mistakes with censored data
Here are two mistakes you need to avoid when working with survival data:
» You shouldn’t exclude participants with a censored survival time from any
survival analysis!
» You shouldn’t replace the censored date with some other value, a practice
called imputing. When you impute numerical data to replace a missing value, it
is common to use the last observed value for that participant (called last
observation carried forward, or LOCF, imputation). However, you should not
impute dates in survival analysis.
Exclusion and imputation don’t work to fix the missingness in censored data. You
can see why in Figure 21-2, where we’ve slid the timelines for all the participants
over to the left as if they all had their surgery on the same date. The time scale
shows survival time in years after surgery instead of chronological time.
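The effect of both mistakes can also be seen numerically. The following Python sketch uses ten made-up participants (the data and the function names are hypothetical, not taken from the figure) and estimates the probability of surviving more than three years in three ways: excluding censored participants, imputing by treating the last date each censored participant was seen as a date of death, and applying the standard Kaplan-Meier product-limit formula, which keeps censored participants in the at-risk group for as long as they were observed:

```python
# Hypothetical data: (years of follow-up after surgery, died?).
# died=False means the participant was censored (alive when last seen).
participants = [
    (0.5, True), (1.0, False), (1.5, True), (2.0, False), (2.5, True),
    (3.5, False), (4.0, True), (4.5, False), (5.0, False), (5.0, False),
]
HORIZON = 3.0  # we want the probability of surviving more than 3 years


def naive_fraction_surviving(times, horizon):
    """Fraction of the given times that exceed the horizon (ignores censoring)."""
    return sum(1 for t in times if t > horizon) / len(times)


def kaplan_meier_survival(data, horizon):
    """Kaplan-Meier estimate of the probability of surviving past the horizon."""
    survival = 1.0
    death_times = sorted({t for t, died in data if died and t <= horizon})
    for t in death_times:
        # Everyone still being followed at time t is at risk, censored or not.
        at_risk = sum(1 for time, _ in data if time >= t)
        deaths = sum(1 for time, died in data if died and time == t)
        survival *= 1.0 - deaths / at_risk
    return survival


# Mistake 1: throw away every censored participant.
excluded = naive_fraction_surviving(
    [t for t, died in participants if died], HORIZON
)

# Mistake 2: impute, pretending each censored participant died when last seen.
imputed = naive_fraction_surviving([t for t, _ in participants], HORIZON)

# Proper handling of censoring.
km = kaplan_meier_survival(participants, HORIZON)

print(f"Excluding censored participants: {excluded:.2f}")
print(f"Imputing last-seen date as death: {imputed:.2f}")
print(f"Kaplan-Meier estimate:            {km:.2f}")
```

In this made-up data set, exclusion gives 0.25 and imputation gives 0.50, while the Kaplan-Meier estimate is about 0.66. Both shortcuts understate survival here because they discard or distort the time that censored participants are known to have survived.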